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Abstract 

Exponential random graph models are a class of widely used exponential fam- 
ily models for social networks. The topological structure of an observed network 
is modeled by the relative prevalence of a set of local sub-graph configurations 
termed network statistics. One of the key tasks in the application of these mod- 
els is which network statistics to include in the model. This can be thought of 
as statistical model selection problem. This is a very challenging problem — the 
posterior distribution for each model is often termed "doubly intractable" since 
computation of the likelihood is rarely available, but also, the evidence of the 
posterior is, as usual, also intractable. We present a fully Bayesian model se- 
lection method based on a Markov chain Monte Carlo algorithm of Caimo and 
Friel (2011) which estimates the posterior probability for each competing model 
as well as a possible approach for computing the model evidence. 

1 Introduction 

Exponential random graph models are a powerful and flexible family of statistical 
models for networks which allows us to model network topologies without requiring 
any independence assumption between dyads (pairs of nodes). These models have 
been utilized extensively in the social science literature since they allow to statistically 
account for the complexity inherent in many network data. The basic assumption 
of these models is that the topological structure in an observed network y can be 
explained by the relative prevalence of a set of overlapping sub-graph configurations 
s(y) also called graph or network statistics (see Figure 1). 

Formally a random network Y consists of a set of n nodes and m dyads {Yy : i = 
1, . . . , n; j = 1, . . . ,n} where Yy = 1 if the pair is connected (full dyad), and 
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Yij = otherwise (empty dyad). Edges connecting a node to itself are not allowed 
so Ya = 0. The graph Y may be directed (digraph) or undirected depending on the 
nature of the relationships between the nodes. 

Exponential random graph models (ERGMs) are a particular class of discrete linear 
exponential families which represent the probability distribution of Y as 

P (y\e) = q -M= *MQ T s(y)} () 

where s(y) is a known vector of sufficient statistics computed on the network (or 
graph) (see Snijders et al. (2006) and Robins et al. (2007)) and 6 are model parame- 
ters describing the dependence of p(y\0) on the observed statistics s(y). Estimating 
ERGM parameters is a challenging task due to the intractability of the normalising 
constant z(0) and the issue of model degeneracy (see Handcock (2003) and Rinaldo 
et al. (2009)). 

An important problem in many applications is the choice of the most appropriate 
set of explanatory statistics network statistics s(y) to include in the model from a 
set of a priori plausible ones. In fact in many applications there is a need to classify 
different types of networks based on the relevance of a set of configurations with respect 
to others. 

From a Bayesian point of view, the model choice problem is transformed into a 
parameter estimation problem aiming at estimating the posterior probability of all 
models within the considered class of competing models. In order to account for 
the uncertainty concerning the model selection process, Bayesian Model Averaging 
(Hoeting et al., 1999) offers a coherent methodolody which consists in averaging over 
many different competing models. 

In the ERGM context, the intractability of the likelihood makes the use of stan- 
dard techniques quite challenging. The purpose of this paper is to present two new 
methods for Bayesian model selection for ERGMs. This article is structured as fol- 
lows. A brief overview of Bayesian model selection theory is given in Section 2. An 
across-model approach based on a trans-dimensional extension of the exchange algo- 
rithm of Caimo and Friel (2011) is presented in Section 3. The issue of the choice 
of suitable jump proposals is addressed by presenting an automatic reversible jump 
exchange algorithm involving an independence sampler based on a distribution fitting 
a parametric density approximation to the within-model posterior. This algorithm 
bears some similarity to that presented in Chapter 6 of Green et al. (2003). The sec- 
ond novel method is a within-model approach for computing the model evidence. This 
approach is based on the path sampling approximation for estimating the likelihood 
normalizing constant and it makes use of nonparametric density estimation procedures 
for approximating the posterior density of each competing model (Section 4). Three 
illustrations of how these new method perform in practice are give in Section 5. Some 
conclusions are outlined in Section 6. The Bergm package for R, provided some of the 
functions used in this paper and it is available on the CRAN package repository at 
http : //cran. r-project . org/web/packages/Bergm. 
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Figure 1: Some of the most used sub-graph configurations for undirected graphs (anal- 
ogous directed versions can be used for digraphs). 
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2 Overview of Bayesian model selection 



Bayesian model comparison is commonly performed by estimating posterior model 
probabilities. More precisely, suppose that the competing models can be enumerated 
and represented by the set {m h : h = 1, . . . , H}. Suppose data y is assumed to have 
been generated by model m h , the posterior distribution is: 

ta i \ P(y\0 h ,m h ) p(6 h \m h ) 
p(0 h \y,m h ) = — r 2 

p{y\m h ) 

where p(y\0h,mh) is the likelihood and p(0h\rrih) represents the prior distribution of 
the parameters of model m^. The model evidence (or marginal likelihood) for model 

p(y\m h ) = p(y\6 h ,m h ) p(6 h \m h ) d6 h (3) 

J Oh 

represents the probability of the data y given a certain model and is typically im- 
possible to carry out analytically. However, the model evidence is crucial for Bayesian 
model selection since it allows us to make statements about posterior model probabil- 
ities. Bayes' theorem can be written as 



p{y\m h ) p(m h ) 



p( m h\y) = ^ H , , — — — r- ( 4 ) 



Based on these posterior probabilities, pairwise comparison of models, and say, 
can be summarised by the posterior odds: 

p(m h \y) _ p{y\m h ) ^ p{m h ) 
p{m k \y) p(y\m k ) p{m k )' 

This equation reveals how the data y through the Bayes factor 

BF h „ = « (6) 
p{y\m k ) 

updates the prior odds 

p(m k ) 

to yield the posterior odds. 

By treating p(mh\y) as a measure of the uncertainty around of model a natural 
approach for model selection is to choose the most likely a posteriori, i.e. the model 
for which p{m,h\y) is the largest. 

Bayesian model averaging (Hoeting et al., 1999) provides a way of summarising 
model uncertainty in inference and prediction. After observing the data y is possible 
to predict a possible future outcome y* by calculating an average of the posterior 
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distributions under each of the models considered, weighted by their posterior model 
probability. : 

H 

p(y*\y) = ^2p(y*\m h , y)p{m h \y) (8) 

h=l 

where p(y*\rrih, y) represents the posterior prediction of y* according to model rrih and 
data y. 

As we said above model evidence is generally difficult to compute and exact solu- 
tions are known for a small class of distributions. Numerical integration methods are 
usually needed, either general methods such as Gaussian integration or a Monte Carlo 
method, or methods specialized to statistical problems such as the Laplace approxi- 
mation, Gibbs sampling or the EM algorithm. 

2.1 Computing Bayes factors 

Generally speaking there are two approaches for compunting Bayes factors: across- 
model and within-model estimation. The former strategies involve the use of an 
MCMC algorithm generating a single Markov chain which crosses the joint model 
and parameter space so as to sample from 

p(0 h , m h \y) oc p(y\O h , m h ) p(0 h \m h ) p(m h ). (9) 

One of the most popular approach used in this context of the reversible jump MCMC 
algorithm of Green (1995) which is briefly reviewed in Section 2.1.1. 

Within-model strategies focus on the posterior distribution (2) of each competing 
model mh separately aiming at estimating their model evidence (3) which can then 
be used to calculate Bayes factors (see for example Chib (1995), Chib and Jeliazkov 
(2001), Neal (2001), Friel and Pettitt (2008), and Friel and Wyse (2012), who present 
a review of these methods). 

2.1.1 Reversible jump MCMC 

The Reversible Jump MCMC (RJMCMC) algorithm is a flexible technique for model 
selection introduced by Green (1995) which allows simulation from target distributions 
on spaces of varying dimension. In the reversible jump algorithm, the Markov chain 
"jumps" between parameter subspaces (models) of differing dimensionality, thereby 
generating samples from the joint distribution of parameters and model indices. 

To implement the algorithm we consider a countable collection of candidate models, 
{mi : k — 1, . . . , K}, each having an associated vector of parameters 0i of dimension 
Di which typically varies across models. We would like use MCMC to sample from 
the joint posterior (9). 

In order to jump from (0 k ,m k ) to (0h,rrih), one may proceed by generating a ran- 
dom vector u from a distribution g and setting (Oh,mh) = fhj((0k,mk), u). Similarly 
to jump from (0 h ,m h ) to (0 k , m k ) we have (0 k , m k ) = f jh ((0 h ,m h ),u*) where u* is a 
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random vector from g* and fh k is some deterministic function. However reversibility is 
only guaranteed when the parameter transition function f k h is a diffeomorphism, that 
is, both a bijection and its differential invertible. A necessary condition for this to 
apply is the so-called "dimension matching": dim(6k) + dim(u) = dim(0h) + dim(u*) 
(where dim(-) stands for "dimension of"). In this case the acceptance probability can 
be written as: 

. f p(0 h ,m h \y) p(m h ->• m k ) g*{u*) 

mm < 1, — r^r—, r — r~v~ 

{ p{0 k ,m k \y) p(m k ->■ m h ) g(u) 

where pim^ — > m k ) is the probability of jumping from model rrih to model m k , and |J| 
is the Jacobian resulting from the transformation from ((6 k ,m k ), u) to ((^,m^), u*). 

Mixing is crucially affected by the choice of the parameter of the jump proposal 
distribution g and this is one of the fundamental difficulties that makes RJMCMC 
often hard to use in practice (Brooks et al., 2003). 




3 Reversible jump exchange algorithm 

In the ERGM context, RJMCMC techniques cannot be used straightforwardly because 
both the likelihood normalizing constant z(0) in (1) cannot be computed analytically. 
Here we present an implementation of an RJMCMC approach for ERGMs based on 
the extention of the exchange algorithm for exponential random graph models (Caimo 
and Friel, 2011). 

For each model mh, this algorithm allows sampling from the following augmented 
distribution: 

p(0 h , ?/, h \y, m h ) oc p{y\6 h , m h )p(0 h \m h )h(0' h \0 h , m h )p{y'\0' h , m h ) (11) 

where p(y\6 h ,rrih) and p(y , \0' h ,mh) are respectively the original likelihood defined on 
the observed data y and the augumented likelihood defined on simulated data y 1 ', 
p(0h\mh) is the parameter prior and h(6' h \6h, mh) is any arbitrary proposal distribution 
for & h . Marginalising (11) over & h and y 1 yields the posterior of interest p(6h\y,mh). 

Auxiliary variable methods for intractable likelihood models, such as the exchange 
algorithm, have not been used in a trans-dimensional setting before. In order to 
propose to move from (0 k ,m k ) to (6' h ,m' h ), the algorithm (11) can be extended to 
sample from: 

p(0' h , k ,m' h , m k , j/|y) oc p(y\0 k ,m k )p(O k \m k )p(m k )h(0' h , m' h \0 k , m k )p(y'\0' h , m' h ) 

(12) 

where p(y\0 k ,m k ) and p(y'\0 / h ,m' h ) are the two likelihood distributions for the data y 
under model m k and the auxiliary data y 1 under the competing model m' h respectively, 
p(0 k \m k ) and p(m k ) are the priors for the parameter k and the respective model m k 
and h(0' h ,m' h \0 k ,m k ) is some jump proposal distribution. Analogously as before, the 
marginal of (12) for 0' h and m' h is the distribution of interest (9). 
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The issue with this method is that tuning the jump proposals h(-) in a sensible 
way so as to get a reasonable mixing can be difficult and automatic choice of jump 
parameters (Brooks et al., 2003) does not apply in this context due to the double 
intractability of the likelihood distribution. 



3.1 Pilot-tuned RJ exchange algorithm 

We now consider nested models or models differing by at most one variable. In this 
case, the move from (0 k ,mk) to a larger model (0' k+1 ,m' k+1 ) such that dim(m' k+l ) = 
dim(mk)+l can be done by proposing the trasformation (8' k+1 , m' k+l ) = ((Ok, 9' k+1 ),mk+i) 
where the (k + l)-th parameter value 6' k+1 is generated from some distribution g k+1 
and then accepting the move with the following probability: 



a = mm 



l Qe k ,m k (l/) (lo' k+ „m' k+1 (y) P(9'k+i\ m 'k+i) P( m 'k+i) 1 h(m' k+l \m k ) 
' Qe k ,m k (y) Qe< ,m' h+1 (l/) P(0 k \m k ) p(m k ) g k+1 (6' k+1 ) h(m k \m' k+l ) 



(13) 

where qo k ,m k (y) indicates the unnormalized likelihood of p(y\O k , m k ) (and so forth for 
the other functions q(-))- The reverse move is similar and is accepted with probability 

or 1 . 

The jump within the same model m k is accepted with the following probability: 

a = min I 1 ge *- m *M qe '^<^ P( 6 'k\ m 'k) P( m 'k) g(gfc) 1 (||) 
' qe k ,m k (y) qe' k , m ' k (y')p(Ok\m k )p(m k )g(e' k '' 

3.2 Auto-RJ exchange algorithm 

Finding suitable proposals for the jump between models is an very challenging task 
and is vital in order to ensure adequate mixing of the trans-dimensional Markov chain. 
In practice, tuning the jump proposals of the pilot-tuned algorithm is very difficult 
without any information about the posterior density covariance structure. A possible 
approach would be to use an independence sampler which does not depend on the 
current state of the MCMC chain but fits a parametric density approximation to the 
within-model posterior distribution so as to have an acceptance rate as high as possible. 

In this spirit, we can propose to jump from (6 k ,m k ) to (0' h , m' h ) using the following 
jump proposals: 

K e L m 'h\ k,m k ) = w(e' h \m' h ) h(m' h \m k ) (15) 

where h(m' h \m k ) represents between-model jump proposal from model m k to model 
m' h and w(0' h \m' h ) is the within-model jump proposal for model m' h . As remarked 
above, the within-model proposals have to be tuned in a sensible way. Posterior 
density approximations such as standard distributions with parameters determined by 
the moments of a sample drawn from (12) can be used as within model proposals for 
each competing model. For example, w(6i\mi) can be a normal distribution Af(fii, Sj) 
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where fii and are the posterior mean and covariance estimates for each model m\. 
In our experience the choice of normal proposals appear to fit quite well in most of the 
examples we looked at. 

The algorithm can be therefore summarized in two steps: the first step (offline) is 
used to sample from the posterior (11) of each model mi and to estimate the parameters 
fii and of the within-model jump proposal; the second step (online) carries out the 
MCMC computation of (12). 

The algorithm can be written in the following concise way: 

OFFLINE RUN 

(0) Estimation of p(0i\y,mi) for I = 1, . . . , H 

i Set (ii = E(0i\y,mi) and £, = Cov^y.mi) 

ii Use w(&i\mi) ~ A/"(Az,£*) as within-model jump proposals, when 

PROPOSING TO JUMP TO MODEL m t 

ONLINE RUN 

(1.1) GlBBS UPDATE of {m' h , & h , y 1 ) 

i Propose m' h from the prior p(-) 

ii Propose 0' h with probability w(-\fX h ,^h) 
Hi Draw y 1 from p(-\0' h ,m' h ) 

(1.2) Accept the jump from (6 k ,m k ) to (0' h ,m' h ) with probability: 



min I 1 qe ^ mkKy > HU h> m h\y> PK^hV^h) PK' lb h) wyu k \^ k ,^ k) y ^ 





,™' h (y) p(0' h \m' h ) p{m' h ) w(0 k \p, k , S fc ) 




m' h (l/) P(0k\m k ) p(m k ) w(0' h \p, h , S ft ) 



4 Estimating model evidence 

In this section we present a within-model approach for estimating the evidence p(y) 
(For ease of reading, we will omit the conditioning on the model indicator mz).The aim 
is to provide a useful method for low-dimensional models to use as a "ground-truth" 
reference to compare with the reversible jump exchange algorithm. The method follows 
from noticing that for any parameter 0*, equation (2) implies that: 

rfrt^OTrigL EOT . (17) 

This is also the starting point for Chib's method for estimating the evidence (Chib, 
1995). Typically 6* is choosen as a point falling in the high posterior probability region 
so as to increase the accuracy of the estimate. To estimate (17), the calculation of 
the intractable likelihood normalizing constant z(0*) and an estimate of the posterior 
density p(0*\y) are required. 
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Estimating z(0*) via path sampling 

The first problem can be tackled using a path sampling approach. Consider introducing 
an auxiliary variable t 6 [0, 1]. We consider the following distribution: 

Taking logarithm and differentiating z(6*t) with respect to t yields basic identity: 

1 d 



^£e*p{(^)} 
v ' yey 

' ^[0* T s(y)]ex V {(8*t) T s(y)} 



z(on) 

v 1 yey 



[d* T s(y)} p(y\o*t) 



yey 



= E y{0H [0* T s(y)] . (19) 

where "E y \en denotes the expectation with respect to the sampling distribution p(y\0*t). 
Therefore integrating (19) from to 1 we have that: 



i 

log{^} = jE y{eH [0* T s(y))dt. 



o 

Now if we choose a discretization of the variable t such that to = < • • • < U < • • • < 
tj = 1, this leads to the following approximation: 

log (fg } M g (i<+ , _ tl) ( g^gfMi j^gUgfM ) , (20) 

where E tf |e* tj [0* T s(y)] is equal to 0* T E y | * t - [s(y)] i.e. the expected network statistic 
counts simulated from 0*ti multiplied by the constant 0*. Remember that z(0) is 

analytically available and it is equal to 2^) i.e. the number of possible graphs on the 
n nodes of the observed network. In terms of computation, E 2/ |0*tJ0* T s(y)] can be 
easily estimated using the same procedures used for simulating auxiliary data from 
the ERGM likelihood. Hence in (20) two types of error emerge: discretization of (4) 
and Monte Carlo error due to the simulation approximation of "E y \g* t . [0* T s(y)}. 

The path of tj's is important for the efficiency of the evidence estimate. For exam- 
ple, we can choose a path of the type £j = (l//) c where c is some tuning constant: for 
c = 1 we have equal spacing of the / points in the interval [0, 1] , for c > 1 we have 
that the t^s are chosen with high frequency close to and for < c < 1 we have that 
the Vs are chosen with high frequency close to 1. 
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Estimating p(6*\y) 

A sample from the posterior p(0\y) can be gathered (via the exchange algorithm, for 
example) and used to calculate a kernel density estimate of the posterior probability at 
the point 0*. In practice, because of the curse of dimensionality, this implies that the 
method cannot be used, for models with greater than 5 parameters. In this paper we 
used the fast and easy to use np package for R (Hayfield and Racine, 2008) to perform 
a nonparametric density estimation of the posterior p(0*\y). 




Figure 2: Path sampling: for each 0* we estimate z{0*) via path sampling using the 
expected network statistics simulated from some points 0*ti along the line connecting 
to 0\ 



5 Applications 

5.1 Gahuku-Gama system 

The Gahuku-Gama system (Read, 1954) of the Eastern Central Highlands of New 
Guinea was used by Hage and Harary (1984) to describe an alliance structure among 
16 sub-tribes of Eastern Central Highlands of New Guinea (Figure 3). The system has 
been split into two network: the "Gamaneg" graph for antagonistic ( "hina" ) relations 
and the "Gamapos" for alliance ("rova") relations. An important feature of these 
structures is the fact that the enemy of an enemy can be either a friend or an enemy. 

5.1.1 Gamaneg 

We first focus on the Gamaneg network by using the 3 competing models specified in 
Table 1 using the following network statistics: 
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Figure 3: Gahuku-Gama system graphs: Gamaneg (left) and Gamapos (right). 



edges J2i<j Va 
triangles E;<j<fc VikVikVij 
4-cycle Ei<j<Kfc ViiVjiVikyki 

We are interested to understand if the transitivity effect expressed by triad closure 
(triangle) and 4-cycle is a closed structure that can sustain mutual social monitoring 
and influence (Pattison and Robins, 2002). 



Model 


m\ 


y~ 


edj 


res 




Model 


m 2 


y~ 


edj 


res 


+ triangles 


Model 


m 3 


y~ 


edj 


,cs 


+ triangles + 4-cycle 



Table 1: Competing models. 



Both the pilot-tuned RJ and auto-RJ exchange algorithms were run for 100, 000 
iterations using very flat normal parameter priors p{6i\m{) ~ A/"(0, 100//) for each 
model mi where I\ is the identity matrix of size equal to the number of dimensions 
of model mi, uniform model priors and 3,000 iterations for the auxiliary network 
simulation. The proposal distributions of the pilot-tuned RJ has been empirically 
tuned so as to get reasonable acceptance rates for each competing model. The offline 
step of the auto-RJ consisted of gathering an approximate sample from p(0\y) and 
then estimating the posterior moments (ii and S; for each of the three models using 
the parallel ADS update procedure. The exchange algorithm was run for 1, 000 x Di 
iterations (discarding the first 100 x Di iterations as burn-in) where Di is the dimension 
of the l-th model using the population MCMC approach described in Caimo and Friel 
(2011). We tuned the parallel ADS move factors 7 so as to get a reasonable acceptance 
rate during the offline step of the estimation. The accuracy of the estimates fii and S/ 
depends on the number of iterations of the auto-RJ offline run. In this example, the 
above number of iterations 1, 000 x Di of has been empirically shown to be sufficient 
for each competing model m\. Tables 2 and 3 report the posterior parameter estimates 
of the model selected for the pilot-tuned RJ and auto-RJ. Figure 4 shows the results 
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from the pilot-tuned RJ: model posterior diagnostics plot and the parameter posterior 
diagnostics plot. Figure 5 shows the same plots from auto-RJ. Between-model and 
within-model acceptance rates (reported in Table 3) are calculated as the proportions 
of accepted moves from (6k, rrik) to model (0' h , m' h ) for each k : k ^ h and when k = h 
respectively. The mixing of the auto-RJ algorithm within each model is faster than the 
pilot-tuned RJ algorithm due to the good approximation to the posterior distribution. 
The pilot-tuned algorithm took about 24 minutes to complete the estimation and the 
auto-RJ took about 31 minutes (including the offline step). 





Pilot-tuned RJ 


Auto-RJ 


Parameter 


Post. Mean 


Post. Sd. 


Post. Mean 


Post. Sd. 


Model mi 


6>i (edge) 


-1.15 


0.21 


-1.15 


0.21 


Model m2 


0i (edge) 


-0.97 


0.36 


-0.96 


0.37 


02 (triangle) 


-0.31 


0.41 


-0.29 


0.37 


Model 7713 


0i (edge) 


-0.98 


0.51 


-1.15 


0.37 


02 (triangle) 


-0.76 


0.47 


-0.31 


0.42 


3 (4-cycle) 


-0.05 


0.12 


0.02 


0.17 



Table 2: Summary of posterior parameter estimates. 



Within-model 


Pilot-tuned RJ 


Auto-RJ 


Model 77i\ 


0.14 


0.62 


Model m 2 


0.11 


0.42 


Model 7713 


0.00 


0.48 


Between- model 


0.07 


0.04 



Table 3: Acceptance rates. 





Pilot-tuned RJ 


Auto-RJ 


BF12 


14.46 


21.68 


BFi t3 


1506.43 


1425.77 


p{mi\y) 


0.93 


0.95 


p{m 2 \y) 


0.06 


0.04 


p(m 3 \y) 


0.01 


0.01 



Table 4: Bayes factor and posterior model probability estimates. 
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MCMC output : posterior model probabilities 
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Iterations 



MCMC output for Model 1 : posterior parameter probabilities 



o 




100 



0-i (edges) Iterations Lag 



Figure 4: Pilot-tuned RJ exchange algorithm output: posterior model probabilities 
(top) and posterior parameter probabilities (bottom). 
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MCMC output : posterior model probabilities 





1 1 1 

4e+04 8e+04 



MCMC output for Model 1 : posterior parameter probabilities 
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Figure 5: Auto-RJ exchange algorithm output: posterior model probabilities (top) and 
posterior parameter probabilities (bottom). 
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In terms of the calculating the evidence based on path sampling, Figure 6 shows the 
behaviour of K y \o* t [6* T s(y)~\ for 50 equaly-spaced path points t{ from to 1. The larger 
the number of temperatures / and the number of simulated networks, the more precise 
the estimate of the likelihood normalizing constant and the longer the computing effort. 
In this example we estimated (19) using 100 path points and sampling 500 network 
statistics for each of them. In this case, this setup has been empirically shown to be 
sufficiently accurate. We set c to be equal to 1 for all the models.. However different 
choices for c do not seem to have a big influence on the estimation results if I is large 
enough. 



Figure 6: E[0 s{y)\ estimated from a ladder of 50 equally-spaced path points. 



A nonparametric density estimation of p(0\y) for each competing model was imple- 
mented using approximate posterior samples gathered from the output of the exchange 
algorithm. Bayes Factor estimates for different sample sizes (which are increasing with 
the number of model dimension) are reported in Table 5. The results are consistent 
with the ones obtained by RJ exchange algorithms displayed in Table 4. The evidence- 
based approach took about a few seconds to estimate model evidence for mi and m 2 
and about 6 minutes for model m 3 using the biggest sample sizes displayed in Table 4. 

According to the scale of Kass and Raftery (1995) there is positive/strong evidence 
in favor of model mi which is the one including the number of edges. Thus in this 
case the only strong effect of the antagonistic structure of the Gahuku-Gama tribes is 
represented by the low edge density. 



5.1.2 Gamapos 

In this example, we considered the same competing models of Table 1. In this case the 
pilot-tuned RJ exchange algorithm was infeasible to be used effectively as it turned out 
to be very sensitive to the choice of the jump proposal. We used the auto-RJ exchange 
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Sample 


sizes 




Model 


mi 


100 


500 


1,000 


5,000 


Model 


m 2 


150 


750 


1,500 


7, 500 


Model 


m 3 


200 


1,000 


2,000 


10,000 


BF12 




18.83 


18.72 


17.75 


19.09 


BFx,3 




1029.67 


1483.52 


1363.91 


1390.08 



Table 5: Bayes Factor estimates for increasing values of sample sizes used for the 
posterior density estimation. 



algorithm with the same setup of the previous example. The output from auto-RJ 
exchange algorithm is displayed in Figure 8 and the parameter posterior estimates in 
Table 6. 



Parameter Post. Mean Post. Sd. 



Model W3 (within-model acc. rate: 0.3) 



9i (edge) 


-2.41 


0.45 


9 2 (triangle) 


2.91 


0.71 


9 3 (4-cycle) 


-0.66 


0.22 


Model mi (within-model acc. 


rate: 0.64) 


01 (edge) 


-1.15 


0.20 


Model m 2 (within-model acc. 


rate: 0.3) 


01 (edge) 


-1.69 


0.35 


02 (triangle) 


0.48 


0.20 



Between-model acc. rate: 0.03 



Table 6: Summary of posterior parameter estimates and acceptance rates. 



We also calculated the evidence for each models following the same setup of the 
Gamaneg example. In Table 7 are reported the Bayes Factor estimates of the auto-RJ 
exchange algorithm and evidence-based method using the biggest sample sizes for the 
posterior density estimation of the previous example. 





Auto-RJ algorithm 


Evidence-based method 


BF 3A 


17.83 


19.31 


BF 3j2 


34.81 


32.82 



Table 7: Bayes factors estimates. 



In the Gamapos network the transitivity and the 4-cycle structure are important 
features of the network. The tendency to a low density of edges and 4-cycles expressed 
by the negative posterior mean of the first and third parameters is balanced by a 
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MCMC output : posterior model probabilities 




MCMC output for Model 3 : posterior parameter probabilities 




i — r 

-4.0 



n — r 

-3.0 



~i — r 

-2.0 



8, (edges) 



8 2 (triangle) 



-1.0 





-1.5 -1.0 -0.5 0.0 

6 3 (cycle4) 




I 1 1 1 1 T 

20000 40000 

Iterations 




\ 1 1 1 1 T 

20000 40000 



Iterations 




I 1 1 1 1 T 

20000 40000 

Iterations 




1 1 1 1 r 

20 40 60 80 100 
Lag 




"I 1 1 1 r 

20 40 60 80 100 
Lag 




1 1 1 1 r 

20 40 60 80 100 
Lag 



Figure 7: Auto-RJ exchange algorithm output: posterior model probabilities (top) and 
posterior parameter probabilities (bottom^ 



Model 1 



Model 2 



Model 3 



s - 



0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.C 

t, 1| 



0.0 0.2 0.4 0.6 0.8 1.0 



Figure 8: E[6* T s(y)} estimated from a ladder of 50 equally-spaced path points. 

propensity for local triangles which gives rise to the formation of small well-defined 
alliances. 



5.2 Collaboration between Lazega's lawyers 

The Lazega network data collected by Lazega (2001) and displayed in Figure 9 rep- 
resents the symmetrized collaboration relations between the 36 partners in a New 
England law firm, where the presence of an edge between two nodes indicates that 
both partners collaborate with the other. 

In this example we want to compare 4 models (Table 8) using the edges, geometri- 
cally weighted degrees and geometrically weighted edgewise shared partners (Snijders 
et al, 2006): 



geometrically weighted degree (gwd) e^ u Y^k=l — — e~^") fc j Dk{y) 
geometrically weighted edgewise e^ v Y^k=i — — e^")*" j EP k (y) 

shared partner (gwesp) 

where <p u = log(2), <\> v = log(2), Dk(y) is the number of pairs that have exactly 
k common neighbors and EPk(y) is the number of connected pairs with exactly k 
common neighbors. 

As happened in the previous example, the pilot-tuned RJ exchange algorithm 
proved to be ineffective due to the difficulty of the tuning problem. The auto-RJ 
exchange algorithm was run for 100, 000 iterations using the same flat normal priors of 
the previous examples and 20, 000 auxiliary iterations for network simulation. The of- 
fline run consisted of estimating fli and S/ for each of the 4 models by using 6, 000 x D\ 
main iterations (discarding the first 1,000 x Di iterations as burnin). The algorithm 
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Figure 9: Lazega's lawyers graph. 



Model 


nil 


y ~ 


edges 




Model 


m 2 


y ~ 


edges + 


gwesp(log(2)) 


Model 


m 3 


y ~ 


edges + 


gwesp(log(2)) + gwd(log(2)) 


Model 


M4 




edges + 


gwd(log(2)) 



Table 8: Competing models. 
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took about 1 hour and 50 minutes to complete the estimation, the results of which are 
displayed in Figure 10 and Table 9. 



Parameter 


Post. Mean Post. Sd. 


Model m 2 (within-model acc. 


rate: 0.24) 


0i (edge) 


-3.93 


0.33 


6 2 (gwesp(log(2))) 


1.15 


0.16 


Model m 3 (within-model acc. 


rate: 0.26) 


0i (edge) 


-4.54 


0.56 


9 2 (gwesp(log(2))) 


-1.39 


0.23 


6 3 (gwd(log(2))) 


0.79 


0.62 



Between-model acc. rate: 0.03 



Table 9: Summary of posterior parameter estimates and acceptance rates. 

The evidence-based algorithm was carried out using 200 path points from each of 
which we sampled 500 networks. The results are reported in Table 10. The algorithm 
took 25 seconds to estimate the evidence for model mi, 8 minutes for model m 2 , 9 
minutes for model m$, 1 minute for model 777.4. 





Auto-RJ algorithm 


Evidence-based method 


BF 2 ,i 


> 10 6 


> 10 6 




5.72 


4.65 


BF 2A 


> 10 6 


> 10 6 



Table 10: Bayes Factor estimates. 



Table 10 displays the Bayes Factor for the comparison between model m 2 (best 
model) against the others. There is positive evidence to reject model m 3 and very 
strong evidence to models mi and 777.4. 

We can therefore conclude that the low density effect expressed by the negative edge 
parameter combined with the positive transitivity effect expressed by the geometrically 
weighted edgewise partners parameter are strong structural features not depending on 
popularity effect expressed by the weighted degrees. These results are in agreement 
with the findings reported in the literature (see Snijders et al. (2006) and Hunter and 
Handcock (2006)). 

6 Discussion 

In this paper, we have presented two novel methods to Bayesian model selection for 
exponential random graph models. The first one is an across-model approach based 
on a trans-dimensional extention of the exchange algorithm for exponential random 
graph models of Caimo and Friel (2011). An independence sampler making use of a 
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MCMC output : posterior model probabilities 
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Model 3 




Oe+00 
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Iterations 



MCMC output for Model 2 : posterior parameter probabilities 
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Figure 10: Auto-RJ exchange algorithm output: posterior model probabilities (top) 
and posterior parameter probabilities (bottom). 
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Figure 11: E[0* T s(y)} estimated from a ladder of 50 equally-spaced path points. 

parametric approximation of the posterior is proposed in order to overcome the issue 
of tuning of the jump proposal distributions and increase within-model accepteptance 
rates. A within-model approach for estimating the model evidence relying on the path 
sampling approximation of the likelihood normalizing constant and posterior density 
estimation via nonparametric techniques is also proposed. Both methods have been 
illustrated by three examples. 
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